Investors are planning to open a foodservice place in Moscow.
Research objective:
Prepare a study of the Moscow foodservice market, identify its notable features, and present the results in a form that helps investors choose a suitable location.
Research tasks:
Give recommendations on:
Available information:
a dataset with foodservice places in Moscow, compiled on the basis of data from Yandex Maps and Yandex Business services for the summer of 2022. The information posted in the Yandex Business service could have been added by users or found in publicly available sources. It is purely for reference purposes.
The research process:
# importing the libraries
import pandas as pd
import os
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px
from folium import Map, Choropleth, Marker
from folium.plugins import MarkerCluster
# saving the dataset to the 'data' variable
pth = 'mos_places.csv'
if os.path.exists(pth):
    data = pd.read_csv(pth, sep=',')
else:
    print('Problem with the file')
# saving a copy of the original dataset
df = data.copy()
# the first five rows of the dataset
data.head()
| | name | category | address | district | hours | lat | lng | rating | price | avg_bill | middle_avg_bill | middle_coffee_cup | chain | seats | district_eng |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | WoWфли | cafe | Москва, улица Дыбенко, 7/1 | Северный административный округ | ежедневно, 10:00–22:00 | 55.878494 | 37.478860 | 5.0 | NaN | NaN | NaN | NaN | 0 | NaN | Northern Administrative District |
| 1 | Четыре комнаты | restaurant | Москва, улица Дыбенко, 36, корп. 1 | Северный административный округ | ежедневно, 10:00–22:00 | 55.875801 | 37.484479 | 4.5 | выше среднего | Средний счёт:1500–1600 ₽ | 1550.0 | NaN | 0 | 4.0 | Northern Administrative District |
| 2 | Хазри | cafe | Москва, Клязьминская улица, 15 | Северный административный округ | пн-чт 11:00–02:00; пт,сб 11:00–05:00; вс 11:00... | 55.889146 | 37.525901 | 4.6 | средние | Средний счёт:от 1000 ₽ | 1000.0 | NaN | 0 | 45.0 | Northern Administrative District |
| 3 | Dormouse Coffee Shop | coffee shop | Москва, улица Маршала Федоренко, 12 | Северный административный округ | ежедневно, 09:00–22:00 | 55.881608 | 37.488860 | 5.0 | NaN | Цена чашки капучино:155–185 ₽ | NaN | 170.0 | 0 | NaN | Northern Administrative District |
| 4 | Иль Марко | pizzeria | Москва, Правобережная улица, 1Б | Северный административный округ | ежедневно, 10:00–22:00 | 55.881166 | 37.449357 | 5.0 | средние | Средний счёт:400–600 ₽ | 500.0 | NaN | 1 | 148.0 | Northern Administrative District |
# checking the number of rows in the dataset
print("Total places in the dataset:", data.shape[0])
Total places in the dataset: 8406
# basic information
data.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 8406 entries, 0 to 8405
Data columns (total 15 columns):
 #   Column             Non-Null Count  Dtype
---  ------             --------------  -----
 0   name               8406 non-null   object
 1   category           8406 non-null   object
 2   address            8406 non-null   object
 3   district           8406 non-null   object
 4   hours              7870 non-null   object
 5   lat                8406 non-null   float64
 6   lng                8406 non-null   float64
 7   rating             8406 non-null   float64
 8   price              3315 non-null   object
 9   avg_bill           3816 non-null   object
 10  middle_avg_bill    3149 non-null   float64
 11  middle_coffee_cup  535 non-null    float64
 12  chain              8406 non-null   int64
 13  seats              4795 non-null   float64
 14  district_eng       8406 non-null   object
dtypes: float64(6), int64(1), object(8)
memory usage: 985.2+ KB
The dataset contains information about 8,406 places. It is clear from the first lines that there are missing values.
There are fifteen columns in the dataset in total:
- name: the place's name;
- category: category of the place;
- address: address;
- district: administrative district of the city;
- hours: opening hours;
- lat: latitude of the geographical point;
- lng: longitude of the geographical point;
- rating: place rating according to user ratings in Yandex Maps;
- price: price category;
- avg_bill: the range of the average bill;
- middle_avg_bill: average bill;
- middle_coffee_cup: the cost of a cup of cappuccino;
- chain: whether the place belongs to a chain (1) or not (0);
- seats: number of seats;
- district_eng: name of the administrative district in English.

The columns name, category, address, district, hours, price, avg_bill and district_eng have object type; lat, lng, rating, middle_avg_bill, middle_coffee_cup and seats have float64 type; chain has int64 type.
The data is sufficient for analysis. We will prepare it for further work.
data.duplicated().sum()
0
There are no duplicates.
# distribution of values in the dataset
data.describe()
| | lat | lng | rating | middle_avg_bill | middle_coffee_cup | chain | seats |
|---|---|---|---|---|---|---|---|
| count | 8406.000000 | 8406.000000 | 8406.000000 | 3149.000000 | 535.000000 | 8406.000000 | 4795.000000 |
| mean | 55.750109 | 37.608570 | 4.229895 | 958.053668 | 174.721495 | 0.381275 | 108.421689 |
| std | 0.069658 | 0.098597 | 0.470348 | 1009.732845 | 88.951103 | 0.485729 | 122.833396 |
| min | 55.573942 | 37.355651 | 1.000000 | 0.000000 | 60.000000 | 0.000000 | 0.000000 |
| 25% | 55.705155 | 37.538583 | 4.100000 | 375.000000 | 124.500000 | 0.000000 | 40.000000 |
| 50% | 55.753425 | 37.605246 | 4.300000 | 750.000000 | 169.000000 | 0.000000 | 75.000000 |
| 75% | 55.795041 | 37.664792 | 4.400000 | 1250.000000 | 225.000000 | 1.000000 | 140.000000 |
| max | 55.928943 | 37.874466 | 5.000000 | 35000.000000 | 1568.000000 | 1.000000 | 1288.000000 |
The minimum average bill (0) looks abnormal. The maximum (35,000) may be explained by banquet-oriented restaurants in the dataset. The maximum cappuccino price (1,568) and the maximum number of seats (1,288) also look abnormal.
Checking the minimum value in the column middle_avg_bill.
# select the rows in which the value of middle_avg_bill is 0
data.query('middle_avg_bill == 0')
| | name | category | address | district | hours | lat | lng | rating | price | avg_bill | middle_avg_bill | middle_coffee_cup | chain | seats | district_eng |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 3688 | Кофемания | coffee shop | Москва, улица Новый Арбат, 19 | Центральный административный округ | ежедневно, круглосуточно | 55.752136 | 37.587784 | 4.5 | высокие | Средний счёт:от 0 ₽ | 0.0 | NaN | 1 | 200.0 | Central Administrative District |
The listing for "Coffeemania" on Novy Arbat gives the average bill as "from 0 ₽", so no meaningful value is specified. Let's look at the median average bill in this chain's places in the Central Administrative District of Moscow and replace the zero with that median.
# the median average bill in the "Coffeemania" in the Central Administrative District of Moscow
print(data.query('name == "Кофемания" and district_eng == "Central Administrative District"')['middle_avg_bill'].median())
# replace the zero value with the median found
data.loc[3688, 'middle_avg_bill'] = data.query('name == "Кофемания" and district_eng == "Central Administrative District"')['middle_avg_bill'].median()
2000.0
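Note that the query above also includes the zero row itself when computing the median. A minimal sketch of a variant that excludes zeros and missing values before taking the median is shown below; the `median_without_zeros` helper and the toy values are hypothetical and only illustrate the idea.

```python
import pandas as pd

# Toy stand-in for one chain's rows; the numbers are illustrative, not taken from the dataset.
toy = pd.DataFrame({
    "name": ["Кофемания"] * 4,
    "district_eng": ["Central Administrative District"] * 4,
    "middle_avg_bill": [0.0, 1500.0, 2000.0, 2500.0],
})


def median_without_zeros(frame: pd.DataFrame, column: str) -> float:
    """Median of `column`, ignoring zeros and missing values (hypothetical helper)."""
    valid = frame.loc[frame[column] > 0, column]
    return valid.median()


# Compute the replacement from valid rows only, then overwrite the zero.
replacement = median_without_zeros(toy, "middle_avg_bill")
toy.loc[toy["middle_avg_bill"] == 0, "middle_avg_bill"] = replacement
```

With many rows per chain the difference is usually small, but for chains with only a few places a single zero can shift the median noticeably.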
We look at the 95th and 99th percentiles of the cost of a cup of cappuccino.
print(np.percentile(data.query('middle_coffee_cup.notna()', engine="python")['middle_coffee_cup'], [95, 99]))
[275. 309.9]
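The same percentiles can be obtained without filtering out missing values first, because `Series.quantile` skips NaN by default. A short sketch on toy values (the prices below are made up for illustration):

```python
import numpy as np
import pandas as pd

# Toy cappuccino prices with gaps; illustrative values only.
prices = pd.Series([120.0, np.nan, 150.0, 180.0, np.nan, 200.0, 250.0])

# Series.quantile ignores NaN, so no explicit notna() filter is needed.
p95, p99 = prices.quantile([0.95, 0.99])

# The same calculation with NumPy requires dropping NaN explicitly.
reference = np.percentile(prices.dropna(), [95, 99])
```

Both approaches use linear interpolation by default, so they agree on the same input.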
Coffee shops with the most expensive coffee.
# select the places where the cost of a cup of cappuccino is more than 310 rubles
data.query('middle_coffee_cup > 310')
| | name | category | address | district | hours | lat | lng | rating | price | avg_bill | middle_avg_bill | middle_coffee_cup | chain | seats | district_eng |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1769 | Шоколадница | coffee shop | Москва, Бутырская улица, 95 | Северный административный округ | пн-чт 07:30–23:00; пт 07:30–00:00; сб круглосу... | 55.807331 | 37.580716 | 4.1 | средние | Цена чашки капучино:300–350 ₽ | NaN | 325.0 | 1 | 61.0 | Northern Administrative District |
| 2075 | Il Tocco | coffee shop | Москва, Ходынский бульвар, 11А | Северный административный округ | пн-пт 08:00–21:00; сб,вс 09:00–22:00 | 55.785455 | 37.530359 | 4.1 | средние | Цена чашки капучино:280–350 ₽ | NaN | 315.0 | 0 | NaN | Northern Administrative District |
| 2859 | Шоколадница | coffee shop | Москва, Большая Семёновская улица, 27, корп. 1 | Восточный административный округ | ежедневно, 08:00–23:00 | 55.782268 | 37.709022 | 4.2 | средние | Цена чашки капучино:230–2907 ₽ | NaN | 1568.0 | 1 | 48.0 | Eastern Administrative District |
| 3718 | Кафетериус | coffee shop | Москва, Большая Никитская улица, 35 | Центральный административный округ | пн-пт 08:00–22:00; сб,вс 10:00–22:00 | 55.757292 | 37.595033 | 4.3 | средние | Цена чашки капучино:279–378 ₽ | NaN | 328.0 | 1 | 30.0 | Central Administrative District |
| 5503 | Coffee FM | coffee shop | Москва, Авиамоторная улица, 10, корп. 1 | Юго-Восточный административный округ | пн-пт 08:00–21:00; сб,вс 09:00–19:00 | 55.754233 | 37.715491 | 4.3 | NaN | Цена чашки капучино:250–500 ₽ | NaN | 375.0 | 0 | 190.0 | South-Eastern Administrative District |
| 5990 | Диемм | coffee shop | Москва, 3-я Фрунзенская улица, 1 | Центральный административный округ | ежедневно, 08:30–23:00 | 55.719000 | 37.582203 | 4.3 | средние | Цена чашки капучино:250–390 ₽ | NaN | 320.0 | 0 | 30.0 | Central Administrative District |
The value for one "Shokoladnitsa" is incorrect, most likely because of a typo in the upper bound of the price range (230–2907 ₽). Let's look at the median cappuccino price in this chain's places in the Eastern Administrative District of Moscow and replace the value with that median.
# the median cappuccino price in the "Shokoladnitsa" chain in the Eastern Administrative District of Moscow
print(data.query('name == "Шоколадница" and district_eng == "Eastern Administrative District"')['middle_coffee_cup'].median())
# replace the erroneous value with the median found
data.loc[2859, 'middle_coffee_cup'] = data.query('name == "Шоколадница" and district_eng == "Eastern Administrative District"')['middle_coffee_cup'].median()
256.0
Let's look at the 95th and 99th percentiles of the number of seats and at the places where there are more seats than the 99th percentile.
print(np.percentile(data.query('seats.notna()', engine="python")['seats'], [95, 99]))
[307. 625.]
# select the places where the number of seats is more than 625
data.query('seats > 625')
| | name | category | address | district | hours | lat | lng | rating | price | avg_bill | middle_avg_bill | middle_coffee_cup | chain | seats | district_eng |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2713 | Ваня и Гоги | bar, pub | Москва, Измайловское шоссе, 71, корп. А | Восточный административный округ | ежедневно, 11:00–06:00 | 55.789842 | 37.750282 | 4.2 | высокие | Средний счёт:1000–2500 ₽ | 1750.0 | NaN | 0 | 1040.0 | Eastern Administrative District |
| 2722 | Маргарита | fast food | Москва, Измайловское шоссе, 71, корп. А | Восточный административный округ | ежедневно, 10:00–22:00 | 55.789527 | 37.752004 | 4.3 | NaN | NaN | NaN | NaN | 1 | 1040.0 | Eastern Administrative District |
| 2770 | Шоколадница | coffee shop | Москва, Измайловское шоссе, 71, корп. А | Восточный административный округ | пн-ср 11:00–23:00; чт 11:00–00:00; пт,сб 11:00... | 55.789903 | 37.749822 | 4.1 | NaN | NaN | NaN | NaN | 1 | 1040.0 | Eastern Administrative District |
| 2901 | Ресторан Тройка | bar, pub | Москва, Измайловское шоссе, 71, корп. 2Б | Восточный административный округ | ежедневно, 11:00–23:00 | 55.789276 | 37.747832 | 3.7 | NaN | NaN | NaN | NaN | 0 | 660.0 | Eastern Administrative District |
| 2913 | Хаус бар | bar, pub | Москва, Измайловское шоссе, 71, корп. 2Б | Восточный административный округ | ежедневно, 07:00–23:00 | 55.789639 | 37.747274 | 3.7 | средние | Средний счёт:500–1000 ₽ | 750.0 | NaN | 0 | 660.0 | Eastern Administrative District |
| 2966 | Матрешка | cafe | Москва, Измайловское шоссе, 71, корп. А | Восточный административный округ | NaN | 55.789867 | 37.749656 | 4.0 | NaN | NaN | NaN | NaN | 0 | 1040.0 | Eastern Administrative District |
| 4180 | Eataly | bar, pub | Москва, Киевская улица, 2 | Западный административный округ | ежедневно, 12:00–23:00 | 55.743405 | 37.562535 | 4.6 | NaN | NaN | NaN | NaN | 0 | 920.0 | Western Administrative District |
| 4231 | РестоБар Argomento | cafeteria | Москва, Кутузовский проспект, 41, стр. 1 | Западный административный округ | ежедневно, 12:00–23:00 | 55.738237 | 37.531819 | 4.2 | высокие | Средний счёт:2500–5000 ₽ | 3750.0 | NaN | 0 | 1200.0 | Western Administrative District |
| 4245 | Стейк & Бургер | cafe | Москва, Киевская улица, 2 | Западный административный округ | ежедневно, 09:00–21:00 | 55.742953 | 37.561872 | 4.2 | NaN | NaN | NaN | NaN | 1 | 920.0 | Western Administrative District |
| 5486 | Дом | cafe | Москва, улица Юности, 1 | Восточный административный округ | NaN | 55.736204 | 37.815500 | 4.9 | NaN | NaN | NaN | NaN | 0 | 760.0 | Eastern Administrative District |
| 5655 | The Fox Pub | bar, pub | Москва, Мичуринский проспект, 22, корп. 1 | Западный административный округ | пн-чт 12:00–00:00; пт,сб 12:00–02:00; вс 12:00... | 55.701625 | 37.504137 | 4.7 | NaN | NaN | NaN | NaN | 0 | 650.0 | Western Administrative District |
| 5720 | Ресторан китайской кухни Чуаньюй | restaurant | Москва, Мичуринский проспект, 7, корп. 1 | Западный административный округ | ежедневно, 11:00–23:00 | 55.700431 | 37.512799 | 4.4 | средние | NaN | NaN | NaN | 0 | 650.0 | Western Administrative District |
| 5738 | Университетское | coffee shop | Москва, Мичуринский проспект, 8, стр. 1 | Западный административный округ | ежедневно, 07:00–20:00 | 55.705561 | 37.511163 | 4.3 | NaN | NaN | NaN | NaN | 0 | 650.0 | Western Administrative District |
| 5758 | Шоколадница | coffee shop | Москва, Мичуринский проспект, 22, корп. 1 | Западный административный округ | ежедневно, 08:00–23:00 | 55.701211 | 37.503986 | 4.2 | средние | Цена чашки капучино:239–274 ₽ | NaN | 256.0 | 1 | 650.0 | Western Administrative District |
| 5835 | Lyanson’s coffee | cafe | Москва, Мичуринский проспект, 27, корп. 1 | Западный административный округ | пн-пт 08:00–20:00 | 55.697630 | 37.501684 | 3.8 | NaN | NaN | NaN | NaN | 0 | 650.0 | Western Administrative District |
| 5841 | For Your Kids | cafe | Москва, Мичуринский проспект, 58, корп. 1 | Западный административный округ | NaN | 55.691382 | 37.486793 | 3.3 | NaN | NaN | NaN | NaN | 0 | 650.0 | Western Administrative District |
| 6518 | DelonixCafe | restaurant | Москва, проспект Вернадского, 94, корп. 1 | Западный административный округ | ежедневно, круглосуточно | 55.652577 | 37.475730 | 4.1 | высокие | Средний счёт:1500–2000 ₽ | 1750.0 | NaN | 0 | 1288.0 | Western Administrative District |
| 6524 | Ян Примус | restaurant | Москва, проспект Вернадского, 121, корп. 1 | Западный административный округ | пн-чт 12:00–00:00; пт,сб 12:00–02:00; вс 12:00... | 55.657166 | 37.481519 | 4.5 | выше среднего | Средний счёт:1500 ₽ | 1500.0 | NaN | 1 | 1288.0 | Western Administrative District |
| 6548 | Vibes cafe | cafe | Москва, улица Миклухо-Маклая, 6 | Юго-Западный административный округ | пн-пт 09:00–20:00; сб 09:00–16:00 | 55.651882 | 37.499295 | 4.4 | NaN | NaN | NaN | NaN | 0 | 644.0 | South-Western Administrative District |
| 6574 | Мюнгер | pizzeria | Москва, проспект Вернадского, 97, корп. 1 | Западный административный округ | пн-пт 08:00–21:00; сб,вс 10:00–21:00 | 55.667505 | 37.491001 | 4.8 | NaN | NaN | NaN | NaN | 1 | 1288.0 | Western Administrative District |
| 6641 | One Price Coffee | coffee shop | Москва, проспект Вернадского, 84, стр. 1 | Западный административный округ | ежедневно, 08:30–20:00 | 55.665129 | 37.478635 | 4.3 | NaN | NaN | NaN | NaN | 1 | 1288.0 | Western Administrative District |
| 6658 | ГудБар | bar, pub | Москва, проспект Вернадского, 97, корп. 1 | Западный административный округ | пн-пт 11:00–23:00; сб,вс 13:00–23:00 | 55.667327 | 37.490601 | 4.1 | средние | Средний счёт:700 ₽ | 700.0 | NaN | 0 | 1288.0 | Western Administrative District |
| 6684 | Пивной ресторан | bar, pub | Москва, проспект Вернадского, 121, корп. 1 | Западный административный округ | NaN | 55.657133 | 37.481508 | 4.5 | NaN | NaN | NaN | NaN | 0 | 1288.0 | Western Administrative District |
| 6690 | Японская кухня | restaurant | Москва, проспект Вернадского, 121, корп. 1 | Западный административный округ | NaN | 55.657255 | 37.481547 | 4.4 | NaN | NaN | NaN | NaN | 1 | 1288.0 | Western Administrative District |
| 6696 | Кабул | restaurant | Москва, улица Миклухо-Маклая, 6 | Юго-Западный административный округ | пн-пт 07:00–21:00; сб 07:00–18:00 | 55.652025 | 37.498719 | 4.3 | NaN | NaN | NaN | NaN | 1 | 644.0 | South-Western Administrative District |
| 6771 | Точка | cafe | Москва, проспект Вернадского, 84, стр. 1 | Западный административный округ | NaN | 55.665634 | 37.477830 | 4.7 | NaN | NaN | NaN | NaN | 1 | 1288.0 | Western Administrative District |
| 6807 | Loft-cafe академия | cafe | Москва, проспект Вернадского, 84, стр. 1 | Западный административный округ | пн-пт 09:00–20:00; сб 09:00–16:00 | 55.665142 | 37.478603 | 3.6 | NaN | NaN | NaN | NaN | 0 | 1288.0 | Western Administrative District |
| 6808 | Яндекс Лавка | restaurant | Москва, проспект Вернадского, 51, стр. 1 | Западный административный округ | ежедневно, круглосуточно | 55.672580 | 37.507753 | 4.0 | NaN | NaN | NaN | NaN | 1 | 1288.0 | Western Administrative District |
| 6838 | Alternative coffee | coffee shop | Москва, проспект Вернадского, 41, стр. 1 | Западный административный округ | пн-пт 09:00–21:00; сб,вс 09:00–22:00 | 55.673128 | 37.502992 | 4.3 | NaN | NaN | NaN | NaN | 0 | 1288.0 | Western Administrative District |
| 7987 | Ресторан | restaurant | Москва, улица Маршала Захарова, 6, корп. 1 | Южный административный округ | NaN | 55.623680 | 37.704937 | 4.5 | NaN | NaN | NaN | NaN | 0 | 675.0 | Southern Administrative District |
Groups of places located at the same address share the same very high number of seats, so these values are indeed anomalies. We will leave them in the dataset and use median values in further work with the seats column.
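Such groups can also be found programmatically rather than by eye. The sketch below counts how many places share the same address and the same seat count; the `shared_seat_groups` helper, the `min_places` threshold and the toy rows are assumptions made for illustration.

```python
import pandas as pd

# Toy rows that mimic the pattern seen above; not real dataset values.
toy = pd.DataFrame({
    "address": ["A", "A", "A", "B", "C", "C"],
    "name": ["p1", "p2", "p3", "p4", "p5", "p6"],
    "seats": [1040.0, 1040.0, 1040.0, 45.0, 660.0, 660.0],
})


def shared_seat_groups(frame: pd.DataFrame, min_places: int = 2) -> pd.DataFrame:
    """Return (address, seats) pairs reported by at least `min_places` places."""
    groups = (
        frame.dropna(subset=["seats"])
        .groupby(["address", "seats"])
        .agg(places=("name", "count"))
        .reset_index()
    )
    return groups[groups["places"] >= min_places]


shared = shared_seat_groups(toy)
```

Grouping by address will not catch every case: in the table above, several places on Vernadsky Avenue report 1,288 seats at different building numbers, so grouping by street and seat count may be a useful complement.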
Also we have "Yandex Lavka" in the dataset, which is a food delivery service. Let's see how many such names are in the data.
# select the places named "Yandex Lavka" or "Yandex.Lavka" and count them
data.query('name == "Яндекс Лавка" or name == "Яндекс.Лавка"').shape[0]
72
Delete them from the dataset.
data = data.query('name not in ("Яндекс Лавка","Яндекс.Лавка")').copy()
# count the number of missing values in each column of the dataset
data.isna().sum()
name                    0
category                0
address                 0
district                0
hours                 535
lat                     0
lng                     0
rating                  0
price                5019
avg_bill             4518
middle_avg_bill      5185
middle_coffee_cup    7799
chain                   0
seats                3574
district_eng            0
dtype: int64
Heat map of the share of missing values in each column, in percent.
pd.DataFrame(round(data.isna().mean()*100,)).style.background_gradient('coolwarm')
| | 0 |
|---|---|
| name | 0.000000 |
| category | 0.000000 |
| address | 0.000000 |
| district | 0.000000 |
| hours | 6.000000 |
| lat | 0.000000 |
| lng | 0.000000 |
| rating | 0.000000 |
| price | 60.000000 |
| avg_bill | 54.000000 |
| middle_avg_bill | 62.000000 |
| middle_coffee_cup | 94.000000 |
| chain | 0.000000 |
| seats | 43.000000 |
| district_eng | 0.000000 |
More than 50% of the values are missing in the price, avg_bill and middle_avg_bill columns: the dataset is based on data from Yandex Maps and Yandex Business, which means that the owners of the places or their visitors did not add information about prices and the average bill.
The missing values in the middle_avg_bill column also arise because values from avg_bill are not carried over to it when they are specified for coffee shops or bars/pubs.
The very high percentage of missing values (94%) in the middle_coffee_cup column is explained by the fact that this column is filled from avg_bill only if the value starts with the substring "Price of one cup of cappuccino" ("Цена чашки капучино"), i.e. mainly for coffee shops, while the dataset contains other categories of places as well.
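The relationship between avg_bill and the two derived columns can be sketched as a small parser. This is a reconstruction inferred from the rows shown earlier (for example, "239–274 ₽" corresponds to 256.0, which suggests a truncated midpoint), not the dataset's documented logic; the function name `parse_avg_bill` is hypothetical.

```python
import re
from typing import Optional, Tuple

BILL_PREFIX = "Средний счёт:"           # "Average bill:"
COFFEE_PREFIX = "Цена чашки капучино:"  # "Price of one cup of cappuccino:"


def parse_avg_bill(text: object) -> Tuple[Optional[float], Optional[float]]:
    """Return (middle_avg_bill, middle_coffee_cup) for one avg_bill value.

    Assumption: the derived value is the midpoint of the range, rounded down;
    a single number ("от 1000 ₽") is taken as is.
    """
    if not isinstance(text, str):  # NaN and other non-strings carry no data
        return None, None
    numbers = [int(n) for n in re.findall(r"\d+", text)][:2]
    if not numbers:
        return None, None
    middle = float(sum(numbers) // len(numbers))
    if text.startswith(BILL_PREFIX):
        return middle, None
    if text.startswith(COFFEE_PREFIX):
        return None, middle
    return None, None  # any other prefix fills neither column


bill_example = parse_avg_bill("Средний счёт:1500–1600 ₽")
coffee_example = parse_avg_bill("Цена чашки капучино:239–274 ₽")
```

Under this reading, a value with any other prefix leaves both derived columns empty, which is consistent with the gaps described above.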
Fill in the missing values as follows:
- hours: 6% of the values are missing; we leave the column as it is;
- price: for chain places, we take the price category from other places of the same chain located in the same administrative district;
- avg_bill: we leave the missing values, because the study relies on middle_avg_bill;
- middle_avg_bill: we fill it on the assumption that places of the same category, with the same price level and in the same administrative district, have the same average bill;
- middle_coffee_cup: for chain coffee shops, we take the price from other places of the same chain located in the same administrative district;
- seats: we leave the missing values, because even places of the same chain can have different areas and different numbers of seats.

Fill in the missing values in the price column.
# select the chain places that have a price category specified and save them to the 'chains' variable
chains = data.query('chain == 1 and price.notna()', engine="python").copy()
# collect the price categories in a table, broken down by administrative district and chain name
chains = chains.groupby(['district', 'name']).agg({'price': 'first'})
# function that looks up the price category in the 'chains' table
def find_price_category(row):
    try:
        district = row['district']
        name = row['name']
        return chains.loc[district, name]['price']
    except Exception:
        # no reference value for this district/chain pair
        return None
# apply the function to the price column of the 'data' dataset
data.loc[data['price'].isna(), 'price'] = data.apply(find_price_category, axis=1)
Fill in the missing values in the column middle_avg_bill.
# select the places for which the price and middle_avg_bill columns are filled in
places = data.query('price.notna() and middle_avg_bill.notna()', engine="python").copy()
# collect the median average bill in a table, by administrative district, category and price level
places = places.groupby(['district', 'category', 'price']).agg({'middle_avg_bill': 'median'}).round()
# function that looks up the middle_avg_bill value in the 'places' table
def find_middle_avg_bill(row):
    try:
        district = row['district']
        category = row['category']
        price = row['price']
        return places.loc[district, category, price]['middle_avg_bill']
    except Exception:
        # no reference value for this district/category/price combination
        return None
# apply the function to the middle_avg_bill column of the 'data' dataset
data.loc[data['middle_avg_bill'].isna(), 'middle_avg_bill'] = data.apply(find_middle_avg_bill, axis=1)
Fill in the missing values in the column middle_coffee_cup.
# select the chain coffee shops for which the middle_coffee_cup column is filled
coffee_shops = data.query('chain == 1 and middle_coffee_cup.notna()', engine="python").copy()
# collect the cappuccino price in a table, by administrative district and chain name
coffee_shops = coffee_shops.groupby(['district', 'name']).agg({'middle_coffee_cup': 'first'})
# function that looks up the middle_coffee_cup value in the 'coffee_shops' table
def find_middle_coffee_cup(row):
    try:
        district = row['district']
        name = row['name']
        return coffee_shops.loc[district, name]['middle_coffee_cup']
    except Exception:
        # no reference value for this district/chain pair
        return None
# apply the function to the middle_coffee_cup column of the 'data' dataset
data.loc[data['middle_coffee_cup'].isna(), 'middle_coffee_cup'] = data.apply(find_middle_coffee_cup, axis=1)
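The three fills above follow one pattern: take a reference value from rows that share the same keys. A vectorized sketch of that pattern with `groupby(...).transform(...)` is given below. The `fill_from_group` helper and the toy frame are hypothetical, and the helper does not restrict the reference rows to chain places, so the caller would need to pass the appropriate subset.

```python
import numpy as np
import pandas as pd


def fill_from_group(frame: pd.DataFrame, target: str, keys: list, how: str = "first") -> pd.Series:
    """Fill missing `target` values from rows that share the same `keys`.

    `how` is a groupby aggregation name: "first" takes the first non-missing
    value in the group, "median" takes the group median.
    """
    reference = frame.groupby(keys)[target].transform(how)
    return frame[target].fillna(reference)


# Toy frame: chain "a" has a known price in both districts, chain "b" has none.
toy = pd.DataFrame({
    "district": ["N", "N", "N", "E", "E"],
    "name": ["a", "a", "b", "a", "a"],
    "middle_coffee_cup": [170.0, np.nan, np.nan, np.nan, 256.0],
})
toy["middle_coffee_cup"] = fill_from_group(toy, "middle_coffee_cup", ["district", "name"])
```

Groups without any known value stay missing, which matches the behaviour of the row-wise functions when the lookup fails.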
Let's check what share of the missing values was filled.
# heat maps for the processed dataset 'data' and the original 'df'
display(pd.DataFrame(round(data.isna().mean()*100,)).style.background_gradient('coolwarm'))
display(pd.DataFrame(round(df.isna().mean()*100,)).style.background_gradient('coolwarm'))
| | 0 |
|---|---|
| name | 0.000000 |
| category | 0.000000 |
| address | 0.000000 |
| district | 0.000000 |
| hours | 6.000000 |
| lat | 0.000000 |
| lng | 0.000000 |
| rating | 0.000000 |
| price | 54.000000 |
| avg_bill | 54.000000 |
| middle_avg_bill | 50.000000 |
| middle_coffee_cup | 91.000000 |
| chain | 0.000000 |
| seats | 43.000000 |
| district_eng | 0.000000 |
| | 0 |
|---|---|
| name | 0.000000 |
| category | 0.000000 |
| address | 0.000000 |
| district | 0.000000 |
| hours | 6.000000 |
| lat | 0.000000 |
| lng | 0.000000 |
| rating | 0.000000 |
| price | 61.000000 |
| avg_bill | 55.000000 |
| middle_avg_bill | 63.000000 |
| middle_coffee_cup | 94.000000 |
| chain | 0.000000 |
| seats | 43.000000 |
| district_eng | 0.000000 |
The percentage of missing values in the price column decreased from 61 to 54, in middle_avg_bill from 63 to 50, and in middle_coffee_cup from 94 to 91.
We add the street column with street names from the address column to the dataset, the is_24/7 column with the logical value True if the place is open daily and around the clock, and False if not.
# adding a column with street names
data['street'] = data['address'].apply(lambda x: x.split(',')[1].strip())
# adding a column indicating that the place is open daily and around the clock
data['is_24/7'] = data['hours'] == "ежедневно, круглосуточно"
Earlier we found that there are no exact duplicates in the dataset. Let's check for implicit ones: we convert the name and address columns to lowercase, replace ё with е, and then check for duplicates again.
# names and addresses to lowercase
data['address'] = data['address'].str.lower()
data['name'] = data['name'].str.lower()
# replacing ё with е in names and addresses
data['name'] = data['name'].str.replace('ё', 'е')
data['address'] = data['address'].str.replace('ё', 'е')
# duplicates in the name and address columns
data[data[['name', 'address']].duplicated()]
| | name | category | address | district | hours | lat | lng | rating | price | avg_bill | middle_avg_bill | middle_coffee_cup | chain | seats | district_eng | street | is_24/7 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 215 | кафе | cafe | москва, парк ангарские пруды | Северный административный округ | ежедневно, 10:00–22:00 | 55.881438 | 37.531848 | 3.2 | None | NaN | NaN | NaN | 0 | NaN | Northern Administrative District | парк Ангарские пруды | False |
| 1511 | more poke | restaurant | москва, волоколамское шоссе, 11, стр. 2 | Северный административный округ | пн-чт 09:00–18:00; пт,сб 09:00–21:00; вс 09:00... | 55.806307 | 37.497566 | 4.2 | None | NaN | NaN | NaN | 1 | 188.0 | Northern Administrative District | Волоколамское шоссе | False |
| 2420 | раковарня клешни и хвосты | bar, pub | москва, проспект мира, 118 | Северо-Восточный административный округ | пн-чт 12:00–00:00; пт,сб 12:00–01:00; вс 12:00... | 55.810677 | 37.638379 | 4.4 | None | NaN | NaN | NaN | 1 | 150.0 | North-Eastern Administrative District | проспект Мира | False |
| 3109 | хлеб да выпечка | cafe | москва, ярцевская улица, 19 | Западный административный округ | NaN | 55.738449 | 37.410937 | 4.1 | None | NaN | NaN | NaN | 0 | 276.0 | Western Administrative District | Ярцевская улица | False |
There are 4 implicit duplicates in the dataset. Delete them.
data = data.drop_duplicates(['name', 'address'])
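The normalization overwrites the original spelling of names and addresses. If the original text should be kept for display, the comparison keys can be built separately. The sketch below is one possible approach; the `normalize_text` helper and the toy rows are hypothetical.

```python
import pandas as pd


def normalize_text(series: pd.Series) -> pd.Series:
    """Lowercase, replace ё with е and collapse repeated whitespace."""
    return (
        series.str.lower()
        .str.replace("ё", "е", regex=False)
        .str.replace(r"\s+", " ", regex=True)
        .str.strip()
    )


# Toy rows: the first two differ only in letter case and ё/е.
toy = pd.DataFrame({
    "name": ["Шоколадница", "шоколадница", "Кафе"],
    "address": [
        "Москва, Большая Семёновская улица, 27",
        "Москва, Большая Семеновская улица, 27",
        "Москва, улица Дыбенко, 7/1",
    ],
})

# Build normalized keys in a separate frame and keep the original columns intact.
keys = pd.DataFrame({
    "name": normalize_text(toy["name"]),
    "address": normalize_text(toy["address"]),
})
deduplicated = toy[~keys.duplicated()]
```

The first occurrence of each pair is kept, the same default that `drop_duplicates` uses.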
Two abnormal values were processed: the average bill of 0 at the "Coffeemania" coffee shop on Novy Arbat was replaced by the chain's median bill in the same administrative district, and the cappuccino price of 1,568 rubles at one "Shokoladnitsa" coffee shop was replaced by the chain's median cappuccino price in the same district. We also removed 72 "Yandex Lavka" entries (a delivery service) and 4 implicit duplicates.
There are a large number of missing values in the dataset. More than 50% of the values are missing in the price, avg_bill and middle_avg_bill columns, because the dataset is based on data from Yandex Maps and Yandex Business, and the owners of the places or their visitors did not add information about prices and the average bill.
The missing values in the middle_avg_bill column also arise because values from avg_bill are not carried over to it when they are specified for coffee shops or bars/pubs.
The very high percentage of missing values (94%) in the middle_coffee_cup column is explained by the fact that this column is filled mainly for coffee shops.
The missing values were handled as follows:
- hours, avg_bill and seats were left as they are;
- price was filled for chain places from other places of the same chain located in the same administrative district;
- middle_avg_bill was filled with the median value for places of the same category, with the same price level, located in the same administrative district;
- middle_coffee_cup was filled for chain coffee shops from other places of the same chain located in the same administrative district.

After processing, the percentage of missing values in the price column decreased from 61 to 54, in middle_avg_bill from 63 to 50, and in middle_coffee_cup from 94 to 91.
Two columns were added to the dataset: street, with street names, and is_24/7, which indicates whether the place is open daily and around the clock.
Let's see which categories of places and how many of them are represented in the data.
# we group the data by category, count the number of places in each,
# and save the result in the 'categories' variable
categories = (
    data.groupby('category')
    .agg(number=('category', 'count'))
    .sort_values(by='number', ascending=False)
    .reset_index()
)
categories
| | category | number |
|---|---|---|
| 0 | cafe | 2376 |
| 1 | restaurant | 1970 |
| 2 | coffee shop | 1413 |
| 3 | bar, pub | 764 |
| 4 | pizzeria | 633 |
| 5 | fast food | 603 |
| 6 | cafeteria | 315 |
| 7 | bakery | 256 |
# barplot with the number of places by category
plt.style.use('seaborn-v0_8')  # the 'seaborn' style was renamed to 'seaborn-v0_8' in matplotlib 3.6
plt.figure(figsize=(12,7))
plt.title('Number of foodservice places by category',fontsize=15)
g = sns.barplot(data=categories, x='category', y='number')
for p in g.patches:
    g.annotate(round(p.get_height()), xy=(p.get_x() + p.get_width() / 2, p.get_height()), ha='center', va='center', xytext=(0, 10), textcoords='offset points')
plt.xlabel('Categories of places',fontsize=12)
plt.ylabel('Number',fontsize=12);
Cafes and restaurants are the most numerous in the dataset: 2,376 and 1,970 places, respectively. Bakeries are the least numerous, with 256 places.
# we group the data by category, calculate the median number of seats in each,
# and save the result to the 'seats' variable
seats = (
    data.groupby('category')
    .agg(number=('seats', 'median'))
    .sort_values(by='number', ascending=False)
    .reset_index()
)
seats
| | category | number |
|---|---|---|
| 0 | restaurant | 89.0 |
| 1 | bar, pub | 82.0 |
| 2 | coffee shop | 80.0 |
| 3 | cafeteria | 75.5 |
| 4 | fast food | 65.0 |
| 5 | cafe | 60.0 |
| 6 | pizzeria | 55.0 |
| 7 | bakery | 50.0 |
# barplot with the number of seats by category
plt.figure(figsize=(12,8))
plt.title('Number of seats in foodservice places',fontsize=15)
g = sns.barplot(data=seats, x='category', y='number')
for p in g.patches:
    g.annotate(round(p.get_height()), xy=(p.get_x() + p.get_width() / 2, p.get_height()), ha='center', va='center', xytext=(0, 10), textcoords='offset points')
plt.xlabel('Categories of places',fontsize=12)
plt.ylabel('Number of seats',fontsize=12);
Box-and-whisker diagram.
plt.figure(figsize=(12,8))
plt.title('Number of seats in foodservice places',fontsize=15)
g = sns.boxplot(data=data, x='category', y='seats')
plt.ylim(-10, 1000)
plt.xlabel('Categories of places',fontsize=12)
plt.ylabel('Number of seats',fontsize=12);
Restaurants, bars/pubs and coffee shops lead in the number of seats: the median number of seats is 89, 82 and 80, respectively. The boxplot shows outliers: in each category there are places whose number of seats significantly exceeds the median.
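The outliers visible on the boxplot can also be counted. By default seaborn draws whiskers at 1.5 times the interquartile range, so points above Q3 + 1.5 * IQR are shown as outliers. A minimal sketch of that rule follows; the `count_upper_outliers` helper and the toy values are hypothetical.

```python
import pandas as pd


def count_upper_outliers(frame: pd.DataFrame, group_col: str, value_col: str) -> pd.Series:
    """Count values above Q3 + 1.5 * IQR within each group."""

    def above_upper_fence(values: pd.Series) -> int:
        q1, q3 = values.quantile([0.25, 0.75])
        fence = q3 + 1.5 * (q3 - q1)
        return int((values > fence).sum())

    return (
        frame.dropna(subset=[value_col])
        .groupby(group_col)[value_col]
        .apply(above_upper_fence)
    )


# Toy seat counts; illustrative values only.
toy = pd.DataFrame({
    "category": ["cafe"] * 5 + ["bakery"] * 3,
    "seats": [40.0, 50.0, 60.0, 70.0, 1000.0, 30.0, 40.0, 50.0],
})
outliers = count_upper_outliers(toy, "category", "seats")
```

In the toy data only the cafe with 1,000 seats lies above its category's upper fence.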
# we group the data by the chain/non-chain flag, count the places in each group,
# and save the result to the 'is_chains' variable
is_chains = data.groupby('chain').agg(number=('chain', 'count')).reset_index()
is_chains['chain'] = is_chains['chain'].map({0: 'Non-chain', 1: 'Chain'})
is_chains
| | chain | number |
|---|---|---|
| 0 | Non-chain | 5199 |
| 1 | Chain | 3131 |
# pie chart with the ratio of chain and non-chain places
number = is_chains['number']
labels = is_chains['chain']
plt.pie(number, labels=labels, autopct="%1.1f%%", normalize=True)
plt.title("The ratio of chain and non-chain places")
plt.show()
Non-chain places make up 62.4% of the dataset (5,199 of 8,330) and chain places 37.6% (3,131).
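The shares can be recomputed from the counts in the is_chains table as a quick sanity check. This is a minimal sketch; the variable names `is_chains_counts` and `share` are introduced here for illustration.

```python
import pandas as pd

# Counts copied from the is_chains table above
is_chains_counts = pd.Series({'Non-chain': 5199, 'Chain': 3131})

# percentage of each group, rounded to one decimal place
share = (is_chains_counts / is_chains_counts.sum() * 100).round(1)
```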
# we group the data by category and format of the place, calculate the amount,
# save it to the 'chains_categories' variable
chains_categories = data.groupby(['category', 'chain']).agg(number=('chain', 'count')).reset_index()
chains_categories.loc[chains_categories['chain'] == 0, 'chain'] = 'no'
chains_categories.loc[chains_categories['chain'] == 1, 'chain'] = 'yes'
for index, row in chains_categories.iterrows():
    chains_categories.loc[chains_categories['category'] == row[0],'total'] = data.query('category == @row[0]')['category'].count()
chains_categories['percent'] = round(chains_categories['number'] / chains_categories['total'] * 100, 1)
chains_categories
| | category | chain | number | total | percent |
|---|---|---|---|---|---|
| 0 | bakery | no | 99 | 256.0 | 38.7 |
| 1 | bakery | yes | 157 | 256.0 | 61.3 |
| 2 | bar, pub | no | 596 | 764.0 | 78.0 |
| 3 | bar, pub | yes | 168 | 764.0 | 22.0 |
| 4 | cafe | no | 1597 | 2376.0 | 67.2 |
| 5 | cafe | yes | 779 | 2376.0 | 32.8 |
| 6 | cafeteria | no | 227 | 315.0 | 72.1 |
| 7 | cafeteria | yes | 88 | 315.0 | 27.9 |
| 8 | coffee shop | no | 693 | 1413.0 | 49.0 |
| 9 | coffee shop | yes | 720 | 1413.0 | 51.0 |
| 10 | fast food | no | 371 | 603.0 | 61.5 |
| 11 | fast food | yes | 232 | 603.0 | 38.5 |
| 12 | pizzeria | no | 303 | 633.0 | 47.9 |
| 13 | pizzeria | yes | 330 | 633.0 | 52.1 |
| 14 | restaurant | no | 1313 | 1970.0 | 66.6 |
| 15 | restaurant | yes | 657 | 1970.0 | 33.4 |
# graph with the share of chain and non-chain places in each category
fig = px.bar(chains_categories,
x="percent",
y="category",
color="chain",
orientation='h', width=1000, height=500,
color_discrete_sequence=px.colors.qualitative.T10,
text = chains_categories['percent'].map("{:,}%".format),
#.map("{:,}%".format),
labels={"percent": "Percentage of places",
"category": "Categories",
"chain": "Chain"},
title="Percentage of chain and non-chain places by category")\
.update_yaxes(categoryorder="max ascending")
fig.show()
As we can see on the graph, there are three categories of places that are more often chains: coffee shops, pizzerias and bakeries.
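The per-category percentages can also be obtained without a row-by-row loop. The sketch below uses pd.crosstab with row normalisation on a made-up table; the toy values are assumptions and do not come from the dataset.

```python
import pandas as pd

# Hypothetical toy data standing in for the 'category' and 'chain' columns
toy = pd.DataFrame({
    'category': ['bakery', 'bakery', 'bakery', 'bakery',
                 'cafe', 'cafe', 'cafe', 'cafe', 'cafe'],
    'chain': [1, 1, 1, 0, 0, 0, 0, 1, 0],
})

# row-normalised cross-tabulation: each category row sums to 100%
chain_share = (pd.crosstab(toy['category'], toy['chain'], normalize='index') * 100).round(1)
chain_share = chain_share.rename(columns={0: 'no', 1: 'yes'})
```

A category counts as "more often chain" when its 'yes' column exceeds 50, which is the criterion the chart makes visible.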
We will consider as a chain those places for which the value 1 is indicated in the chain column, which have the same name and the same category. Popularity will be assessed by the number of places in the chain.
# we sort chain places, group them by name, count the number of places
# and the number of categories so that non-chain places with the same names do not fall into the group
# save it to the 'top_15' variable
top_15 = (data.query('chain == 1')\
.groupby('name') \
.agg(number=('name', 'count'), category=('category', 'first'), count_category=('category', 'nunique')) \
.sort_values(by=['count_category', 'number'], ascending=[True, False]) \
.reset_index() \
.head(15))
top_15
| | name | number | category | count_category |
|---|---|---|---|---|
| 0 | домино'с пицца | 76 | pizzeria | 1 |
| 1 | додо пицца | 74 | pizzeria | 1 |
| 2 | one price coffee | 71 | coffee shop | 1 |
| 3 | cofix | 65 | coffee shop | 1 |
| 4 | кофепорт | 42 | coffee shop | 1 |
| 5 | кулинарная лавка братьев караваевых | 39 | cafe | 1 |
| 6 | drive café | 24 | cafe | 1 |
| 7 | cinnabon | 20 | coffee shop | 1 |
| 8 | штолле | 19 | bakery | 1 |
| 9 | арамье | 18 | bakery | 1 |
| 10 | vasilchukí chaihona №1 | 17 | restaurant | 1 |
| 11 | сушистор | 16 | restaurant | 1 |
| 12 | моремания | 15 | restaurant | 1 |
| 13 | pizza express 24 | 14 | pizzeria | 1 |
| 14 | бургер кинг | 14 | restaurant | 1 |
We check how the top 15 is distributed by category.
# group the top 15 by category, count the number
top_15.groupby('category').agg(number=('category', 'count')).sort_values(by='number', ascending=False).reset_index()
| | category | number |
|---|---|---|
| 0 | coffee shop | 4 |
| 1 | restaurant | 4 |
| 2 | pizzeria | 3 |
| 3 | bakery | 2 |
| 4 | cafe | 2 |
# barplot with the chain names and the number of places
plt.figure(figsize=(14,7))
plt.title('Top 15 popular chains',fontsize=15)
g = sns.barplot(data=top_15, x='number', y='name', orient='h')
for p in g.patches:
    width, height = p.get_width(), p.get_height()
    x, y = p.get_xy()
    g.text(x+width - 2,
           y+height/2,
           '{:.0f}'.format(width),
           horizontalalignment='left',
           verticalalignment='center')
plt.xlabel('Number of places',fontsize=12)
plt.ylabel('Names',fontsize=12);
Domino's Pizza and Dodo Pizza lead among the popular chains. The top 15 chains include no bars/pubs, fast food places or cafeterias.
Let's take a look at the total number of places from the top 15 and the number of places of each category by district.
# save the names of places from the top 15 chains to the 'top_names' list
top_names = list(top_15['name'].unique())
# keep only the places whose names are in the top 15 chains; the result overwrites the 'top_15' variable
top_15 = data.query('name in @top_names')
# we group the top 15 chains places by district and category, calculate the number
top_15_district = top_15.groupby(['district_eng', 'category']).agg(number=('name', 'count')).reset_index()
top_15_district['district_eng'] = top_15_district['district_eng'].apply(lambda x: x.split(' ')[0])
# interactive chart with the distribution of the top 15 chains places by districts of Moscow
fig = px.bar(top_15_district,
x="number",
y="district_eng",
color="category",
orientation='h', width=1000, height=500,
color_discrete_sequence=px.colors.qualitative.T10,
labels={
"district_eng": "Moscow District",
"number": "Number of places",
"category": "Categories"
},
title="Distribution of the top 15 chains categories by districts of Moscow")\
.update_yaxes(categoryorder="total ascending")
fig.show()
As expected, most of the places from the top 15 chains are in the Central Administrative District of Moscow. The fewest, 33, are in the North-Western district.
Which administrative divisions of Moscow are present in the dataset?
# list of administrative districts of Moscow from the dataset
for x in list(data['district_eng'].unique()):
print(x)
Northern Administrative District
North-Eastern Administrative District
North-Western Administrative District
Western Administrative District
Central Administrative District
Eastern Administrative District
South-Eastern Administrative District
Southern Administrative District
South-Western Administrative District
In total, 9 administrative districts of Moscow are represented in the dataset. Let's look at the total number of places and the number of places of each category in these districts.
# we collect in the 'districts' pivot table the number of places of each category,
# divided by administrative districts
districts = data.pivot_table(index='district_eng', columns = 'category', values='name', aggfunc='count').reset_index()
districts['total'] = districts.sum(axis=1, numeric_only=True)
districts['district_eng'] = districts['district_eng'].apply(lambda x: x.split(' ')[0].strip())
districts = districts.sort_values(by='total')
districts;
# for the graph, we group the number of places of each category by administrative districts
# in the 'districts2' variable
districts2 = data.groupby(['district_eng', 'category']).agg(number=('category', 'count')).reset_index()
districts2['district_eng'] = districts2['district_eng'].apply(lambda x: x.split(' ')[0])
# interactive chart with the distribution of places of all categories by districts of Moscow
fig = px.bar(districts2,
x="number",
y="district_eng",
color="category",
orientation='h', width=1000, height=500,
color_discrete_sequence=px.colors.qualitative.T10,
labels={
"district_eng": "Moscow District",
"number": "Number of places",
"category": "Categories"
},
title="Distribution of categories of all Moscow foodservice places by districts")\
.update_yaxes(categoryorder="total ascending")
fig.show()
The Central Administrative District of Moscow is also the leader in the total number of places.
Let's look at the boxplot.
plt.figure(figsize=(12,8))
plt.title('Ratings by category of places',fontsize=15)
g = sns.boxplot(data=data, x='category', y='rating')
#plt.ylim(-10, 1000)
plt.xlabel('Categories',fontsize=12)
plt.ylabel('Rating',fontsize=12);
Almost all categories of places have outliers towards low ratings.
# we group the data by category and calculate the average rating of each, save it to the 'categ_rating' variable
categ_rating = (data.groupby('category') \
.agg(mean_rating=('rating', 'mean')) \
.reset_index() \
.sort_values(by='mean_rating', ascending=False))
categ_rating
| | category | mean_rating |
|---|---|---|
| 1 | bar, pub | 4.387696 |
| 7 | restaurant | 4.306294 |
| 6 | pizzeria | 4.301264 |
| 4 | coffee shop | 4.277282 |
| 0 | bakery | 4.268359 |
| 3 | cafeteria | 4.211429 |
| 2 | cafe | 4.124285 |
| 5 | fast food | 4.050249 |
# distribution of average ratings by categories of places
plt.figure(figsize=(12,6))
plt.title('Distribution of average ratings by categories of places',fontsize=15)
g = sns.barplot(data=categ_rating, x='category', y='mean_rating')
for p in g.patches:
    g.annotate('{:.2f}'.format(p.get_height()), xy=(p.get_x() + p.get_width() / 2, p.get_height()), ha = 'center', va = 'center', xytext = (0, 10), textcoords = 'offset points')
plt.xlabel('Categories',fontsize=12)
plt.ylabel('Average rating',fontsize=12)
g.set_ylim(0, 5);
The average rating of every category exceeds 4 points. Bars/pubs have the highest average rating (4.39) and fast food places the lowest (4.05).
We calculate the average rating of places in each administrative district of Moscow.
# we group the data by districts and calculate the average rating, save it in the 'district_rating' variable
district_rating = data.groupby('district', as_index=False)['rating'].agg('mean')
district_rating.sort_values(by='rating', ascending=False)
| | district | rating |
|---|---|---|
| 5 | Центральный административный округ | 4.378543 |
| 2 | Северный административный округ | 4.241275 |
| 4 | Северо-Западный административный округ | 4.209606 |
| 8 | Южный административный округ | 4.187614 |
| 1 | Западный административный округ | 4.187262 |
| 0 | Восточный административный округ | 4.179389 |
| 7 | Юго-Западный административный округ | 4.177937 |
| 3 | Северо-Восточный административный округ | 4.150396 |
| 6 | Юго-Восточный административный округ | 4.106232 |
We build a choropleth map with an average rating of places in each district.
# uploading a JSON file with the borders of Moscow districts
state_geo = 'admin_level_geomap.geojson'
# moscow_lat - latitude of the center of Moscow, moscow_lng - longitude of the center of Moscow
moscow_lat, moscow_lng = 55.751244, 37.618423
# creating a map of Moscow
m = Map(location=[moscow_lat, moscow_lng], zoom_start=10)
# creating a choropleth map using the Choropleth function and adding it to the map
Choropleth(
geo_data=state_geo,
data=district_rating,
columns=['district', 'rating'],
key_on='feature.name',
fill_color='RdPu',
fill_opacity=0.8,
legend_name='Average rating of foodservice places by administrative districts of Moscow',
).add_to(m)
# displaying the map
m
Places in the Central Administrative District of Moscow have the highest average rating (4.38); the lowest is in the South-Eastern district (4.11).
All places in the dataset are displayed on the map using clusters.
# moscow_lat - latitude of the center of Moscow, moscow_lng - longitude of the center of Moscow
moscow_lat, moscow_lng = 55.751244, 37.618423
# creating Moscow map
m = Map(location=[moscow_lat, moscow_lng], zoom_start=10)
# creating an empty cluster, adding it to the map
marker_cluster = MarkerCluster().add_to(m)
# a function that takes a dataframe row,
# creates a marker at the current point and adds it to the 'marker_cluster' cluster
def create_clusters(row):
    Marker(
        [row['lat'], row['lng']],
        popup=f"{row['name']} {row['rating']}",
    ).add_to(marker_cluster)
# applying the create_clusters() function to each row of the dataframe
data.apply(create_clusters, axis=1)
# displaying the map
m
There are noticeably more places in the center, north and west of Moscow than in the south and east.
# we group places by streets and count the number, save the first 15 in the 'top_streets' variable
top_streets = (data.groupby('street') \
.agg(number=('name', 'count')) \
.sort_values(by='number', ascending=False) \
.reset_index() \
.head(15))
top_streets
| | street | number |
|---|---|---|
| 0 | проспект Мира | 183 |
| 1 | Профсоюзная улица | 121 |
| 2 | проспект Вернадского | 106 |
| 3 | Ленинский проспект | 106 |
| 4 | Ленинградский проспект | 95 |
| 5 | Дмитровское шоссе | 87 |
| 6 | Каширское шоссе | 77 |
| 7 | Варшавское шоссе | 74 |
| 8 | Ленинградское шоссе | 70 |
| 9 | МКАД | 65 |
| 10 | Люблинская улица | 60 |
| 11 | улица Вавилова | 55 |
| 12 | Кутузовский проспект | 54 |
| 13 | улица Миклухо-Маклая | 48 |
| 14 | Пятницкая улица | 48 |
Mira Avenue (проспект Мира) leads in the number of places in Moscow: 183 places are located on this street.
Let's plot the distribution of the number of places and their categories by the top 15 streets.
# creating a 'streets' list with the names of the top 15 streets
streets = list(top_streets['street'])
# collecting places in a table by top-15 streets and categories, calculating the number,
# saving it to the 'top_streets_categories' variable
top_streets_categories = data.query('street in @streets').pivot_table(index='street', columns = 'category', values='name', aggfunc='count').reset_index()
top_streets_categories['total'] = top_streets_categories.sum(axis=1, numeric_only=True)
top_streets_categories = top_streets_categories.fillna(0)
top_streets_categories.loc[:, 'bakery': 'total'] = top_streets_categories.loc[:, 'bakery': 'total'].astype('int')
top_streets_categories = top_streets_categories.sort_values(by='total')
top_streets_categories
| category | street | bakery | bar, pub | cafe | cafeteria | coffee shop | fast food | pizzeria | restaurant | total |
|---|---|---|---|---|---|---|---|---|---|---|
| 10 | Пятницкая улица | 3 | 9 | 7 | 0 | 6 | 2 | 3 | 18 | 48 |
| 14 | улица Миклухо-Маклая | 0 | 3 | 21 | 0 | 4 | 4 | 2 | 14 | 48 |
| 3 | Кутузовский проспект | 1 | 2 | 14 | 3 | 13 | 2 | 3 | 16 | 54 |
| 13 | улица Вавилова | 2 | 2 | 15 | 0 | 10 | 11 | 3 | 12 | 55 |
| 7 | Люблинская улица | 0 | 5 | 26 | 2 | 11 | 5 | 1 | 10 | 60 |
| 8 | МКАД | 0 | 1 | 45 | 1 | 4 | 9 | 0 | 5 | 65 |
| 5 | Ленинградское шоссе | 2 | 5 | 13 | 3 | 13 | 5 | 3 | 26 | 70 |
| 0 | Варшавское шоссе | 0 | 6 | 18 | 7 | 14 | 7 | 4 | 18 | 74 |
| 2 | Каширское шоссе | 0 | 2 | 20 | 5 | 16 | 10 | 5 | 19 | 77 |
| 1 | Дмитровское шоссе | 2 | 6 | 23 | 4 | 11 | 10 | 8 | 23 | 87 |
| 4 | Ленинградский проспект | 4 | 15 | 12 | 3 | 25 | 2 | 9 | 25 | 95 |
| 6 | Ленинский проспект | 3 | 10 | 26 | 5 | 23 | 2 | 5 | 32 | 106 |
| 11 | проспект Вернадского | 1 | 7 | 25 | 2 | 16 | 12 | 12 | 31 | 106 |
| 9 | Профсоюзная улица | 4 | 6 | 35 | 3 | 18 | 15 | 15 | 25 | 121 |
| 12 | проспект Мира | 4 | 11 | 53 | 2 | 36 | 21 | 11 | 45 | 183 |
# for the graph, we group by top-15 streets and categories, calculate the number, save it to the 'top_streets_categories2' variable
top_streets_categories2 = data.query('street in @streets').groupby(['street', 'category']).agg(number=('name', 'count')).reset_index()
# interactive chart with the distribution of places of all categories in the top 15 streets of Moscow
fig = px.bar(top_streets_categories2,
x="number",
y="street",
color="category",
orientation='h', width=1000, height=500,
color_discrete_sequence=px.colors.qualitative.T10,
labels={
"street": "Street",
"number": "Number of places",
"category": "Categories"
},
title="Distribution of categories of places by top-15 streets") \
.update_yaxes(categoryorder="total ascending")
fig.show()
The most numerous categories of places on the top 15 streets are cafes, coffee shops and restaurants.
We will find streets where there is only one foodservice place, and see what kind of places they are.
# in the 'streets_with_one' variable, data grouped by streets and number of places
# we filter the streets with one place and display the number of rows of this dataset
streets_with_one = (data.groupby('street') \
.agg(number=('name', 'count'), category=('category', 'first')) \
.sort_values(by='number', ascending=False) \
.query('number == 1') \
.reset_index())
streets_with_one.shape[0]
459
# grouping streets with one place from the 'streets_with_one' variable by categories
# counting the number of places in each category
streets_with_one.groupby('category', as_index=False)['street'].agg('count').sort_values(by='street', ascending=False)
| | category | street |
|---|---|---|
| 2 | cafe | 161 |
| 7 | restaurant | 89 |
| 4 | coffee shop | 85 |
| 1 | bar, pub | 39 |
| 3 | cafeteria | 36 |
| 5 | fast food | 23 |
| 6 | pizzeria | 18 |
| 0 | bakery | 8 |
There are 459 streets in the dataset with only one foodservice place; most often (161 streets) that place is a cafe.
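The same selection can be cross-checked by counting places per street and keeping the rows whose street occurs exactly once. The sketch below runs on a made-up table (the street names 'A' to 'D' are placeholders), so only the technique carries over to the real data.

```python
import pandas as pd

# Hypothetical toy data standing in for the 'street' and 'category' columns
toy = pd.DataFrame({
    'street': ['A', 'A', 'B', 'C', 'D', 'D', 'D'],
    'category': ['cafe', 'bar, pub', 'cafe', 'restaurant', 'cafe', 'cafe', 'pizzeria'],
})

# number of places on each row's street
places_per_street = toy['street'].map(toy['street'].value_counts())

# rows that are the only place on their street, and their categories
single = toy[places_per_street == 1]
single_by_category = single['category'].value_counts()
```

A useful consistency check is that the per-category counts add up to the number of single-place streets, as they do in the table above (the category counts sum to 459).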
We calculate the median of the middle_avg_bill column (the average bill) for each district and use this value as the price indicator of the district.
# we group the data by administrative district, calculate the median average bill,
# save it to the 'district_avg_bill' variable
district_avg_bill = data.groupby('district', as_index=False)['middle_avg_bill'].agg('median')
district_avg_bill.sort_values(by='middle_avg_bill', ascending=False)
| | district | middle_avg_bill |
|---|---|---|
| 5 | Центральный административный округ | 950.0 |
| 4 | Северо-Западный административный округ | 675.0 |
| 1 | Западный административный округ | 650.0 |
| 8 | Южный административный округ | 550.0 |
| 0 | Восточный административный округ | 525.0 |
| 7 | Юго-Западный административный округ | 525.0 |
| 2 | Северный административный округ | 500.0 |
| 3 | Северо-Восточный административный округ | 500.0 |
| 6 | Юго-Восточный административный округ | 425.0 |
We build a choropleth map with the values obtained for each district.
# moscow_lat - latitude of the center of Moscow, moscow_lng - longitude of the center of Moscow
moscow_lat, moscow_lng = 55.751244, 37.618423
# creating Moscow map
m = Map(location=[moscow_lat, moscow_lng], zoom_start=10)
# creating a choropleth map using the Choropleth function and adding it to the map
Choropleth(
geo_data=state_geo,
data=district_avg_bill,
columns=['district', 'middle_avg_bill'],
key_on='feature.name',
fill_color='RdPu',
fill_opacity=0.8,
legend_name='Median average bill in places by administrative districts of Moscow',
).add_to(m)
# displaying the map
m
The median average bill in the Central Administrative District (950 rubles) is significantly higher than in the other districts. The cheapest places are in the South-Eastern Administrative District (median average bill of 425 rubles).
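The median is used as the district price indicator because a few very expensive places can pull the mean far away from a typical bill. The sketch below illustrates this on made-up numbers; it also shows that missing bills (NaN, as in the first rows of the dataset) are skipped by the aggregation, so the count of places with a known bill is worth reporting next to the median.

```python
import pandas as pd

# Hypothetical toy data standing in for 'district' and 'middle_avg_bill'
toy = pd.DataFrame({
    'district': ['Central'] * 5 + ['South-Eastern'] * 5,
    'middle_avg_bill': [500, 900, 950, 1000, 10000, 300, 400, 425, 450, None],
})

# median, mean and number of non-missing bills per district
summary = toy.groupby('district')['middle_avg_bill'].agg(['median', 'mean', 'count'])
```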
We study the opening hours of places and their dependence on the location and category of the places.
# group the data by category and opening hours, count the number of places
is_day_night = data.groupby(['category','is_24/7']).agg(number=('is_24/7', 'count')).reset_index()
is_day_night = is_day_night.rename(columns={'is_24/7': 'all_around'})
is_day_night.loc[is_day_night['all_around'] == False, 'all_around'] = 'no'
is_day_night.loc[is_day_night['all_around'] == True, 'all_around'] = 'yes'
# we count the total number of places in each category
# and the percentage of places with round-the-clock and regular opening hours
for index, row in is_day_night.iterrows():
    is_day_night.loc[is_day_night['category'] == row[0],'total'] = data.query('category == @row[0]')['category'].count()
is_day_night['percent'] = round(is_day_night['number'] / is_day_night['total'] * 100, 1)
is_day_night
| | category | all_around | number | total | percent |
|---|---|---|---|---|---|
| 0 | bakery | no | 232 | 256.0 | 90.6 |
| 1 | bakery | yes | 24 | 256.0 | 9.4 |
| 2 | bar, pub | no | 712 | 764.0 | 93.2 |
| 3 | bar, pub | yes | 52 | 764.0 | 6.8 |
| 4 | cafe | no | 2109 | 2376.0 | 88.8 |
| 5 | cafe | yes | 267 | 2376.0 | 11.2 |
| 6 | cafeteria | no | 303 | 315.0 | 96.2 |
| 7 | cafeteria | yes | 12 | 315.0 | 3.8 |
| 8 | coffee shop | no | 1354 | 1413.0 | 95.8 |
| 9 | coffee shop | yes | 59 | 1413.0 | 4.2 |
| 10 | fast food | no | 453 | 603.0 | 75.1 |
| 11 | fast food | yes | 150 | 603.0 | 24.9 |
| 12 | pizzeria | no | 602 | 633.0 | 95.1 |
| 13 | pizzeria | yes | 31 | 633.0 | 4.9 |
| 14 | restaurant | no | 1843 | 1970.0 | 93.6 |
| 15 | restaurant | yes | 127 | 1970.0 | 6.4 |
# a graph with the share of round-the-clock and regular places in each category
fig = px.bar(is_day_night,
x="percent",
y="category",
color="all_around",
orientation='h', width=1000, height=500,
color_discrete_sequence=px.colors.qualitative.T10,
text = is_day_night['percent'].map("{:,}%".format),
labels={"percent": "Percentage of places",
"category": "Categories",
"all_around": "24/7"},
title="Percentage of round-the-clock and regular places by category")\
.update_yaxes(categoryorder="min ascending")
fig.show()
As expected, the "fast food" category has the highest percentage of round-the-clock places (24.9%), while cafeterias have the lowest (3.8%).
Let's see in which districts of Moscow there are more round-the-clock places.
# group the data by district and count the number of round-the-clock places, save it to the 'district_day_night' variable
district_day_night = data.groupby('district', as_index=False)['is_24/7'].agg('sum')
district_day_night
| | district | is_24/7 |
|---|---|---|
| 0 | Восточный административный округ | 95 |
| 1 | Западный административный округ | 71 |
| 2 | Северный административный округ | 71 |
| 3 | Северо-Восточный административный округ | 74 |
| 4 | Северо-Западный административный округ | 43 |
| 5 | Центральный административный округ | 131 |
| 6 | Юго-Восточный административный округ | 92 |
| 7 | Юго-Западный административный округ | 72 |
| 8 | Южный административный округ | 73 |
# moscow_lat - latitude of the center of Moscow, moscow_lng - longitude of the center of Moscow
moscow_lat, moscow_lng = 55.751244, 37.618423
# creating Moscow map
m = Map(location=[moscow_lat, moscow_lng], zoom_start=10)
# creating a choropleth map using the Choropleth function and adding it to the map
Choropleth(
geo_data=state_geo,
data=district_day_night,
columns=['district', 'is_24/7'],
key_on='feature.name',
fill_color='RdPu',
fill_opacity=0.8,
legend_name='The number of round-the-clock places in the administrative districts of Moscow',
).add_to(m)
# displaying the map
m
The Central Administrative District has the largest number of round-the-clock places (131). The choropleth map also shows that there are more round-the-clock places in the east of Moscow than in the west.
For our study, we will consider a rating below 4 points to be low. Let's see what percentage of places have a low rating.
# we filter places with a rating below 4 and calculate their percentage
print("Percentage of places with a low rating:", round(data.query('rating < 4')['name'].count() / data['name'].count() * 100))
Percentage of places with a low rating: 14
We look at how low-rated places are distributed by category.
# we group places with a low rating by categories and count their number, save them in 'low_rating_places'
low_rating_places = data.query('rating < 4').groupby('category', as_index=False).agg(low_rating_number=('name', 'count'))
# we group establishments by categories and count their number, save them in 'all_places'
all_places = data.groupby('category', as_index=False).agg(all_places=('name', 'count'))
# we merge two tables, calculate the share of places with a low rating in each category
low_rating_percent = low_rating_places.merge(all_places, on='category')
low_rating_percent['share'] = round(low_rating_percent['low_rating_number'] / low_rating_percent['all_places'] * 100)
low_rating_percent
| | category | low_rating_number | all_places | share |
|---|---|---|---|---|
| 0 | bakery | 30 | 256 | 12.0 |
| 1 | bar, pub | 40 | 764 | 5.0 |
| 2 | cafe | 542 | 2376 | 23.0 |
| 3 | cafeteria | 44 | 315 | 14.0 |
| 4 | coffee shop | 138 | 1413 | 10.0 |
| 5 | fast food | 162 | 603 | 27.0 |
| 6 | pizzeria | 35 | 633 | 6.0 |
| 7 | restaurant | 153 | 1970 | 8.0 |
Fast food places (27%) and cafes (23%) have the highest share of places with a rating below 4 points.
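The share of low-rated places can also be computed in one pass, as the mean of a boolean flag per category. A minimal sketch on made-up ratings follows; the column name `is_low` is introduced here for illustration. Unlike the two-table merge, this keeps categories that have no low-rated places at all (they get 0 instead of disappearing from the result).

```python
import pandas as pd

# Hypothetical toy data standing in for the 'category' and 'rating' columns
toy = pd.DataFrame({
    'category': ['cafe', 'cafe', 'cafe', 'cafe', 'pizzeria', 'pizzeria'],
    'rating': [3.5, 4.2, 3.9, 4.8, 4.5, 4.1],
})

# the mean of a boolean column is the fraction of True values
low_share = (
    toy.assign(is_low=toy['rating'] < 4)
       .groupby('category')['is_low']
       .mean()
       .mul(100)
       .round()
)
```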
The following can be said about foodservice places in Moscow:
Cafes and restaurants are the most represented categories in the dataset, with 2,378 and 1,971 places, respectively. Bakeries are the least represented (256).
Restaurants, bars/pubs and coffee shops lead in the number of seats: the median is 89, 82 and 80, respectively.
Non-chain places make up 62.4% of the dataset and chain places 37.6%. Coffee shops, pizzerias and bakeries are more often chain places than not.
Domino's Pizza and Dodo Pizza lead among the popular chains. The top 15 chains include no bars/pubs, fast food places or cafeterias.
Most of the places from the top 15 chains are in the Central Administrative District of Moscow; the fewest (33) are in the North-Western district. The Central Administrative District is also the leader in the total number of places: 2,237.
The average rating of every category exceeds 4 points. Bars/pubs have the highest average rating (4.39) and fast food places the lowest (4.05). By district, the average rating is highest in the Central Administrative District (4.38) and lowest in the South-Eastern (4.11).
Mira Avenue leads in the number of places in Moscow: 183 places are located on this street. The most numerous categories on the top 15 streets are cafes, coffee shops and restaurants. There are 459 streets with only one foodservice place; most often that place is a cafe.
The median average bill in the Central Administrative District (950 rubles) is significantly higher than in the other districts. The cheapest places are in the South-Eastern Administrative District (median average bill of 425 rubles).
The highest percentage of round-the-clock places is in the fast food category, the lowest among cafeterias. The Central Administrative District has the largest number of round-the-clock places (131), and there are more of them in the east of Moscow than in the west.
Fast food places (27%) and cafes (23%) have the highest share of places with a rating below 4 points.
We recommend paying attention to such categories of places as coffee shops and pizzerias: there are fewer of these places than cafes and restaurants – the competition is lower. On the other hand, they are not as limited in format as bars and pubs, which are not suitable, for example, for family holidays.
The Central Administrative District of Moscow is the most saturated with foodservice places: 2,237 establishments. In the city center the audience is much larger than the number of residents, because of offices, cultural institutions, etc. To open a new place, we recommend considering the Northern, North-Eastern and Southern districts: there is less competition than in the Central Administrative District, yet these districts have enough places to indicate an audience.
As for the average bill, we recommend focusing on the median average bill of places in the chosen district. In the Northern, North-Eastern and Southern districts, the median bill is 500 rubles, 500 rubles and 550 rubles, respectively.
Investors are planning to open a coffee shop in Moscow. Let's try to give a recommendation for opening a new place.
Let's see how many coffee shops there are in the dataset, in which districts of Moscow there are most of them, what are the features of their location.
print("Total coffee shops in the dataset:", data.query('category == "coffee shop"').shape[0])
Total coffee shops in the dataset: 1413
# we group the filtered coffee shops by district and count the number
district_coffee_shops = data.query('category == "coffee shop"').groupby('district_eng', as_index=False).agg(number=('name', 'count'))
district_coffee_shops['district_eng'] = district_coffee_shops['district_eng'].apply(lambda x: x.split(' ')[0])
# interactive graph of coffee shops distribution by Moscow districts
fig = px.bar(district_coffee_shops,
x="number",
y="district_eng",
orientation='h', width=1000, height=500,
color_discrete_sequence=px.colors.qualitative.T10,
labels={
"district_eng": "Moscow district",
"number": "Number of coffee shops"
},
title="Distribution of Moscow coffee shops by districts")\
.update_yaxes(categoryorder="total ascending")
fig.show()
Clusters of coffee shops on the map of Moscow.
# moscow_lat - latitude of the center of Moscow, moscow_lng - longitude of the center of Moscow
moscow_lat, moscow_lng = 55.751244, 37.618423
# creating Moscow map
m = Map(location=[moscow_lat, moscow_lng], zoom_start=10)
# creating an empty cluster, adding it to the map
marker_cluster = MarkerCluster().add_to(m)
# applying the create_clusters() function to each row of the dataframe with the 'coffee shop' category
data.query('category == "coffee shop"').apply(create_clusters, axis=1)
# displaying the map
m
As we know from the distribution of places by district, the Southern Administrative District has a total number of places comparable to the Northern and North-Eastern districts, but relatively fewer coffee shops. The map shows this as well.
Judging by the share of coffee shops among all places in the district, the Southern Administrative District of Moscow is worth considering for a new coffee shop.
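The share of coffee shops among all places of a district is not computed explicitly above. One way to obtain it is sketched below on made-up data; the district labels and counts are placeholders, and only the approach is meant to carry over to the real 'district_eng' and 'category' columns.

```python
import pandas as pd

# Hypothetical toy data standing in for 'district_eng' and 'category'
toy = pd.DataFrame({
    'district_eng': ['Southern'] * 5 + ['Northern'] * 4,
    'category': ['cafe', 'cafe', 'restaurant', 'coffee shop', 'bar, pub',
                 'coffee shop', 'coffee shop', 'cafe', 'restaurant'],
})

# fraction of coffee shops among all places of each district, in percent
coffee_share = (
    toy['category'].eq('coffee shop')
       .groupby(toy['district_eng'])
       .mean()
       .mul(100)
       .round(1)
       .sort_values()
)
```

Districts at the top of the sorted result have the lowest share of coffee shops, which is one possible proxy for lower competition in this category.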
Previously, we studied the opening hours of foodservice places and know that only 4.2% of all coffee shops in Moscow work 24/7.
Let's see how many round-the-clock coffee shops there are and how they are distributed by districts.
# we group coffee shops by district and count the number of round-the-clock ones,
# save them to the 'district_is_24_7' variable
district_is_24_7 = data.query('category == "coffee shop"').groupby('district_eng').agg(number=('is_24/7', 'sum')).sort_values(by='number', ascending=False).reset_index()
district_is_24_7 = district_is_24_7.rename(columns={'district_eng': 'District', 'number': 'Number'})
district_is_24_7
| | District | Number |
|---|---|---|
| 0 | Central Administrative District | 26 |
| 1 | Western Administrative District | 9 |
| 2 | South-Western Administrative District | 7 |
| 3 | Eastern Administrative District | 5 |
| 4 | Northern Administrative District | 5 |
| 5 | North-Eastern Administrative District | 3 |
| 6 | North-Western Administrative District | 2 |
| 7 | South-Eastern Administrative District | 1 |
| 8 | Southern Administrative District | 1 |
There is only one round-the-clock coffee shop in each of the South-Eastern and Southern districts of Moscow. The largest number of 24-hour coffee shops is in the Central district (26).
Previously, we studied the distribution of average ratings by categories of foodservice places and we know that the average rating of coffee shops is 4.28.
Let's see how the ratings of coffee shops are distributed by districts.
districts_coffee_shops_rating = data.query('category == "coffee shop"').groupby('district').agg(mean_rating=('rating', 'mean')).sort_values(by='mean_rating').reset_index()
districts_coffee_shops_rating['mean_rating'] = districts_coffee_shops_rating['mean_rating'].round(2)
districts_coffee_shops_rating
| | district | mean_rating |
|---|---|---|
| 0 | Западный административный округ | 4.20 |
| 1 | Северо-Восточный административный округ | 4.22 |
| 2 | Юго-Восточный административный округ | 4.23 |
| 3 | Южный административный округ | 4.23 |
| 4 | Восточный административный округ | 4.28 |
| 5 | Юго-Западный административный округ | 4.28 |
| 6 | Северный административный округ | 4.29 |
| 7 | Северо-Западный административный округ | 4.33 |
| 8 | Центральный административный округ | 4.34 |
In the Southern district, the average rating of coffee shops (4.23) is slightly lower than in Moscow as a whole (4.28).
# moscow_lat - latitude of the center of Moscow, moscow_lng - longitude of the center of Moscow
moscow_lat, moscow_lng = 55.751244, 37.618423
# creating Moscow map
m = Map(location=[moscow_lat, moscow_lng], zoom_start=10)
# creating a choropleth map using the Choropleth function and adding it to the map
Choropleth(
    geo_data=state_geo,
    data=districts_coffee_shops_rating,
    columns=['district', 'mean_rating'],
    key_on='feature.name',
    fill_color='RdPu',
    fill_opacity=0.8,
    legend_name='Average rating of coffee shops by administrative districts of Moscow',
).add_to(m)
# displaying the map
m
Coffee shops in the Central (4.34) and North-Western (4.33) districts have the highest average ratings.
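The choropleth joins the table to the map through `key_on='feature.name'`, so the values in the `district` column have to match the feature names in the GeoJSON exactly; this is presumably why the grouping here uses `district` rather than `district_eng`. A quick sanity check could look like the sketch below. The miniature `geo` dictionary is hypothetical and only mimics the structure implied by `key_on`; the real file is whatever was loaded into `state_geo`.

```python
import pandas as pd

# Hypothetical miniature GeoJSON with a top-level 'name' on each feature
geo = {
    'type': 'FeatureCollection',
    'features': [
        {'type': 'Feature', 'name': 'Центральный административный округ', 'geometry': None},
        {'type': 'Feature', 'name': 'Южный административный округ', 'geometry': None},
    ],
}

# Toy table: the second key is deliberately in English to show a mismatch
ratings = pd.DataFrame({
    'district': ['Центральный административный округ', 'Southern Administrative District'],
    'mean_rating': [4.34, 4.23],
})

geo_names = {feature['name'] for feature in geo['features']}
data_names = set(ratings['district'])

# keys present in the table but absent from the map, and vice versa
missing_in_geo = sorted(data_names - geo_names)
missing_in_data = sorted(geo_names - data_names)
print(missing_in_geo, missing_in_data)
```

If either list is non-empty, the corresponding districts would not be colored on the map.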
Next, we determine what price for a cup of cappuccino to target when opening a coffee shop. To do this, we look at the average price of a cup of cappuccino in coffee shops across Moscow districts.
# we group coffee shops by administrative districts and calculate the average cost of a cup of cappuccino
districts_coffee_shops_cup_price = data.query('category == "coffee shop"').groupby('district').agg(middle_coffee_cup=('middle_coffee_cup', 'mean')).sort_values(by='middle_coffee_cup', ascending=False).reset_index()
districts_coffee_shops_cup_price['middle_coffee_cup'] = districts_coffee_shops_cup_price['middle_coffee_cup'].round()
districts_coffee_shops_cup_price
| | district | middle_coffee_cup |
|---|---|---|
| 0 | Центральный административный округ | 187.0 |
| 1 | Юго-Западный административный округ | 187.0 |
| 2 | Западный административный округ | 170.0 |
| 3 | Северо-Восточный административный округ | 168.0 |
| 4 | Южный административный округ | 165.0 |
| 5 | Северо-Западный административный округ | 164.0 |
| 6 | Северный административный округ | 158.0 |
| 7 | Восточный административный округ | 147.0 |
| 8 | Юго-Восточный административный округ | 137.0 |
# moscow_lat - latitude of the center of Moscow, moscow_lng - longitude of the center of Moscow
moscow_lat, moscow_lng = 55.751244, 37.618423
# creating Moscow map
m = Map(location=[moscow_lat, moscow_lng], zoom_start=10)
# creating a choropleth map using the Choropleth function and adding it to the map
Choropleth(
    geo_data=state_geo,
    data=districts_coffee_shops_cup_price,
    columns=['district', 'middle_coffee_cup'],
    key_on='feature.name',
    fill_color='RdPu',
    fill_opacity=0.8,
    legend_name='The average cost of a cup of cappuccino in Moscow districts',
).add_to(m)
# displaying the map
m
The most expensive cappuccino is found in the coffee shops of the Central and South-Western administrative districts: 187 rubles per cup on average.
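Because `middle_coffee_cup` has missing values and a mean is sensitive to outliers, it may be worth checking the median and the number of non-missing prices next to the mean. The following is a minimal sketch on made-up prices; only the column names are taken from the dataset.

```python
import pandas as pd

# Toy prices (illustrative only); None marks a coffee shop without a listed price
toy = pd.DataFrame({
    'district_eng': ['Central'] * 4 + ['Southern'] * 3,
    'middle_coffee_cup': [150, 170, 200, None, 160, 170, None],
})

# mean, median and number of non-missing prices per district
stats = (
    toy.groupby('district_eng')['middle_coffee_cup']
    .agg(['mean', 'median', 'count'])
    .round(1)
)
print(stats)
```

A district average built on only a handful of prices should be read with more caution than one built on dozens.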
Earlier, we also studied the distribution of chain and non-chain places by category and know that in Moscow the number of chain and non-chain coffee shops is approximately equal, unlike cafes, restaurants and bars, where there are significantly more non-chain establishments.
Let's compare the average rating of chain and non-chain coffee shops in Moscow districts.
districts_coffee_shops_chain_rating = data.query('category == "coffee shop"').groupby(['district_eng', 'chain']).agg(rating=('rating', 'mean')).reset_index()
districts_coffee_shops_chain_rating['rating'] = districts_coffee_shops_chain_rating['rating'].round(2)
districts_coffee_shops_chain_rating['district_eng'] = districts_coffee_shops_chain_rating['district_eng'].apply(lambda x: x.split(' ')[0].strip())
districts_coffee_shops_chain_rating
| | district_eng | chain | rating |
|---|---|---|---|
| 0 | Central | 0 | 4.40 |
| 1 | Central | 1 | 4.28 |
| 2 | Eastern | 0 | 4.35 |
| 3 | Eastern | 1 | 4.21 |
| 4 | North-Eastern | 0 | 4.28 |
| 5 | North-Eastern | 1 | 4.15 |
| 6 | North-Western | 0 | 4.49 |
| 7 | North-Western | 1 | 4.19 |
| 8 | Northern | 0 | 4.38 |
| 9 | Northern | 1 | 4.20 |
| 10 | South-Eastern | 0 | 4.32 |
| 11 | South-Eastern | 1 | 4.03 |
| 12 | South-Western | 0 | 4.32 |
| 13 | South-Western | 1 | 4.25 |
| 14 | Southern | 0 | 4.32 |
| 15 | Southern | 1 | 4.15 |
| 16 | Western | 0 | 4.24 |
| 17 | Western | 1 | 4.17 |
plt.figure(figsize=(14,7))
plt.title('Distribution of average ratings of chain and non-chain coffee shops by districts of Moscow',fontsize=15)
g = sns.barplot(data=districts_coffee_shops_chain_rating, x='district_eng', y='rating', hue='chain')
# annotate each bar with its height (the average rating)
for p in g.patches:
    g.annotate('{:.2f}'.format(p.get_height()), xy=(p.get_x() + p.get_width() / 2, p.get_height()), ha='center', va='center', xytext=(0, 10), textcoords='offset points')
plt.xlabel('Administrative districts of Moscow',fontsize=12)
plt.ylabel('Average rating',fontsize=12)
g.set_ylim(4, 4.55);
In each administrative district of Moscow, the rating of non-chain coffee shops is higher than that of chain ones.
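The size of the gap per district is easier to read from a wide table than from the bar chart. The sketch below re-enters the ratings from the output table above and pivots them; the column names `non_chain`, `chain_rating` and `gap` are introduced here for illustration.

```python
import pandas as pd

districts = ['Central', 'Eastern', 'North-Eastern', 'North-Western', 'Northern',
             'South-Eastern', 'South-Western', 'Southern', 'Western']

# ratings copied from the table above: (non-chain, chain) for each district in order
pairs = [(4.40, 4.28), (4.35, 4.21), (4.28, 4.15), (4.49, 4.19), (4.38, 4.20),
         (4.32, 4.03), (4.32, 4.25), (4.32, 4.15), (4.24, 4.17)]

long_df = pd.DataFrame({
    'district_eng': [d for d in districts for _ in range(2)],
    'chain': [0, 1] * len(districts),
    'rating': [r for pair in pairs for r in pair],
})

# one row per district, one column per chain flag (0 = non-chain, 1 = chain)
wide = long_df.pivot(index='district_eng', columns='chain', values='rating')
wide.columns = ['non_chain', 'chain_rating']

# rating advantage of non-chain coffee shops, largest first
wide['gap'] = (wide['non_chain'] - wide['chain_rating']).round(2)
wide = wide.sort_values('gap', ascending=False)
print(wide)
```

With these numbers the gap is positive in every district; it is widest in the North-Western (0.30) and South-Eastern (0.29) districts and equals 0.17 in the Southern district.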
There are 1,413 coffee shops in total. As we know from the study of the distribution of foodservice places by district, the Southern Administrative District has about as many places as the Northern and North-Eastern districts but relatively fewer coffee shops. Given this lower share of coffee shops among all places in the district, the Southern Administrative District can be considered as a location for a new coffee shop.
The South-Eastern and Southern districts of Moscow each have only one round-the-clock coffee shop. The largest number of 24-hour coffee shops is in the city center (26).
In the Southern Administrative District, the average rating of coffee shops is slightly lower than in Moscow as a whole.
The average price of a cup of cappuccino in the Southern Administrative District is 165 rubles. We recommend setting the price no lower than this.
In each administrative district of Moscow, the rating of non-chain coffee shops is higher than that of chain ones. In the Southern Administrative District, non-chain places average 4.32 versus 4.15 for chain places; the gap is widest in the North-Western (4.49 versus 4.19) and South-Eastern (4.32 versus 4.03) districts.
We recommend opening a non-chain 24-hour coffee shop in the Southern Administrative District of Moscow, with a cup of cappuccino priced from 165 rubles.
We have studied a dataset with information about 8,406 foodservice places in Moscow.
We prepared the dataset for analysis.
The missing values were processed:
- the price column was filled in for chain places by analogy with other chain places located in the same administrative district;
- the middle_avg_bill column was filled with the median value for places of the same category with the same price level and located in the same administrative district;
- the middle_coffee_cup column was filled in for chain coffee shops by analogy with other chain establishments located in the same administrative district.
We added two columns to the dataset: street, with street names, and is_24/7, which marks places that are open daily and around the clock.
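The median-based filling and the round-the-clock flag described above can be sketched as follows. This is an illustration on toy rows, not the notebook's actual preprocessing code; in particular, the exact `hours` string used to detect 24/7 places (`'ежедневно, круглосуточно'`) and the price label `'medium'` are assumptions.

```python
import pandas as pd

# Toy rows (illustrative only)
toy = pd.DataFrame({
    'category': ['cafe', 'cafe', 'cafe', 'bar,pub'],
    'price': ['medium'] * 4,  # assumed label, for illustration
    'district_eng': ['Southern'] * 4,
    'middle_avg_bill': [400.0, 600.0, None, None],
    'hours': ['ежедневно, круглосуточно', 'ежедневно, 10:00–22:00',
              None, 'ежедневно, круглосуточно'],
})

# median bill within each (category, price level, district) group, aligned to the rows
group_median = (
    toy.groupby(['category', 'price', 'district_eng'])['middle_avg_bill']
    .transform('median')
)

# fill gaps with the group median; groups with no known values stay missing
toy['middle_avg_bill'] = toy['middle_avg_bill'].fillna(group_median)

# assumed rule: a place is 24/7 if its hours string says "daily, around the clock"
toy['is_24/7'] = toy['hours'].eq('ежедневно, круглосуточно')
print(toy)
```

Note that a group without any known bill (the single bar here) cannot be filled this way and keeps its missing value.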
Cafes and restaurants are the most numerous categories in the dataset. Restaurants, bars/pubs and coffee shops lead in terms of the number of seats. Non-chain places make up 63.2% of the dataset and chain places 36.8%.
The pizza chains Domino's Pizza and Dodo Pizza lead among the popular chains. The top 15 chains do not include pubs/bars, fast food places or canteens.
Most of the places from the top 15 chains are in the Central Administrative District of Moscow (11); the fewest (one) are in the Eastern Administrative District. The Central Administrative District is also the leader in the total number of places: 2,237.
The average rating of every category of places exceeds 4 points. Bars/pubs have the highest rating (4.39). Fast food places (27%) and cafes (23%) have the highest proportion of places rated below 4 points.
Mira Avenue leads in the number of foodservice places in Moscow, with 184 places located on this street. There are 458 streets in Moscow with only one foodservice place. Most often, that single place is a cafe.
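Counting places per street and isolating single-place streets might look like the sketch below. The three addresses are the sample rows shown at the top of the notebook; taking the second comma-separated part of `address` as the street name is an assumption about how the `street` column was built.

```python
import pandas as pd

# Sample rows from the head of the dataset
toy = pd.DataFrame({
    'address': ['Москва, улица Дыбенко, 7/1',
                'Москва, улица Дыбенко, 36, корп. 1',
                'Москва, Клязьминская улица, 15'],
    'category': ['cafe', 'restaurant', 'cafe'],
})

# assumed rule: the street is the second comma-separated part of the address
toy['street'] = toy['address'].str.split(', ').str[1]

# number of places on each street
street_counts = toy['street'].value_counts()

# places located on streets that have exactly one place
single_streets = street_counts[street_counts == 1].index
single_places = toy[toy['street'].isin(single_streets)]
print(single_places['category'].value_counts())
```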
The median average bill in the Central Administrative District of Moscow (950 rubles) is significantly higher than in other districts. The cheapest places are located in the South-Eastern district (the average bill is 425 rubles).
The percentage of round-the-clock places is highest in the fast food category and lowest among canteens.
There are 1,413 coffee shops in total. The Southern Administrative District has about as many places as the Northern and North-Eastern districts but relatively fewer coffee shops.
The South-Eastern and Southern districts of Moscow each have only one round-the-clock coffee shop. The largest number of 24-hour coffee shops is in the city center (26).
In the Southern Administrative District, the average rating of coffee shops is slightly lower than in Moscow as a whole.
The average price of a cup of cappuccino in the Southern Administrative District is 165 rubles.
In each administrative district of Moscow, the rating of non-chain coffee shops is higher than that of chain ones. In the Southern Administrative District, non-chain places average 4.32 versus 4.15 for chain places; the gap is widest in the North-Western (4.49 versus 4.19) and South-Eastern (4.32 versus 4.03) districts.
Based on the results of the detailed study, we recommend opening a non-chain 24-hour coffee shop in the Southern Administrative District of Moscow, with a cup of cappuccino priced from 165 rubles.